This project is an extension of the analysis of my facebook ego network as worked on in class. In addition to the in-depth analysis of the personal network, I also carried out an analysis between a few anomalyzed ego networks.
My personal facebook ego network data is obtained through the Getnet app with instructions posted in assignment 1 of the SNA class. The data is downloaded in .gml format.
The other anomalyzed data sets were obtained from Stanford’s collection of social network data. There are 10 anomalyzed ego network data in total. We make use of these by comparing the properties with my personal network (McAuley and Leskovec 2012).
The nodes represent the friends from the corresponding individual’s facebook ego network data. The edges represent the mutual friend relationships. I believe there may have been some munging/filtering of data in the stanford ego networks.
In addition to the raw data from my facebook ego network, I’ve manually inserted “test” labels of how I met each individual friend in my network. Part of the goal of this project is to see whether different community detection algorithms could pick up the correct clusters corresponding to how I’ve met them in person. For example, from playing Ultimate Frisbee, high school, UBC stats department, etc.
In addition, the idea of including additional ego network data came from Dr. Lada Adamic’s reply in this thread. Therefore, the ensemble of ego network is usually of interest also.
The following step-by-step code makes use of some helper functions created to generate statistics. It is not included here to minimize visual clutter.
Load in the following packages and helper functions
library(igraph)
library(RColorBrewer)
library(plyr)
library(dplyr)
library(xtable)
library(gridExtra)
source("count.degree.distribution.R")
source("computeStats.R")
Read in the Graph
fbNetworkFile = "facebook_with_relation.gml"
G = read.graph(file=fbNetworkFile, format="gml")
The “test labels” I mentioned earlier are called relations, that is, how I first met a particular friend. Assign a colour to each relation.
pairedColors = c(brewer.pal(n=12, name="Paired"))
names(pairedColors) = c("ultimate-rec", "ultimate-competitive", "ubc-cpsc",
"ubc-stat", "ubc-event", "closest-friends", "soccer",
"hockey", "toys-r-us", "environment-canada",
"high-school", "relative")
V(G)$color = revalue(V(G)$relation, pairedColors)
Compute some graph statistics on personal network
outGraphStats = computeGraphStats(G)
Average shortest path: This is the average number of steps along the shortest paths for all possible pairs of network nodes.
Cluster coefficient: Degree to which nodes in a graph tend to cluster together.
Local Cluster coefficient: The fraction pairs of neighbours of the node that are themselves connected. This is the local cluster coefficient for myself.
Let’s compare this with a erdos-renyi random graph to see how it compares with my ego network
First simulate an erdos-renyi random graph is same number of nodes and edges
erdo = erdos.renyi.game(length(V(G)), p.or.m=length(E(G)), type="gnm")
Compute some graph statistics on erdos-renyi graph
gsErdo = computeGraphStats(erdo)
Interpretation Both the average shortest path and cluster coefficient for the ego network is both higher than the random graph which is interesting. The ego network has several dense clusters, and many cliques within clusters as we shall see later. Therefore, the cluster coefficient is high. Since, there are nodes with very few links may have contributed to the higher average shortest path.
Here, we compute individual statistics about the nodes in the graphs. I’ll only only be computing these for a few people from every relation (or “test” labels).
Degree: The number of mutual friends.
Betweeness: The number of shortest paths from all the nodes to all others that pass through that node.
Closeness: The length of the average shortest path between a node and all other nodes in the network.
Pick some names to analyze
nameList = c("Yuji Aizawa", "Jasper Lu", "Rhona Yue", "Kevin Underhill",
"Esther Fann", "Sean Montgomery", "Tyki Sueyoshi", "Louisa Lau",
"Ellery Lee", "Jonathan Baik", "Alex Tan", "Andrew Brear",
"Angela S", "Simon Tai")
Compute individual statistics
nodeStats = computeNodeStats(G)
nodeStats = cbind("name"=V(G)$label, "relation"=V(G)$relation, nodeStats)
nodeStats = data.frame(nodeStats)
nodeStats = nodeStats %>% filter(name %in% nameList) %>%
arrange(desc(betweenness))
print(xtable(nodeStats), comment=F, type="html", include.rownames=F)
| name | relation | degree | betweenness | closeness |
|---|---|---|---|---|
| Yuji Aizawa | closest-friends | 107.00 | 4702.97 | 0.00 |
| Jasper Lu | ultimate-rec | 88.00 | 1040.79 | 0.00 |
| Ellery Lee | closest-friends | 27.00 | 508.51 | 0.00 |
| Rhona Yue | ubc-event | 20.00 | 405.73 | 0.00 |
| Esther Fann | environment-canada | 13.00 | 389.40 | 0.00 |
| Tyki Sueyoshi | ubc-stat | 8.00 | 240.74 | 0.00 |
| Simon Tai | ubc-stat | 37.00 | 204.92 | 0.00 |
| Sean Montgomery | ultimate-competitive | 67.00 | 150.26 | 0.00 |
| Jonathan Baik | environment-canada | 9.00 | 101.45 | 0.00 |
| Kevin Underhill | ultimate-competitive | 56.00 | 63.87 | 0.00 |
| Andrew Brear | ultimate-competitive | 28.00 | 16.73 | 0.00 |
| Alex Tan | ubc-cpsc | 4.00 | 0.05 | 0.00 |
| Angela S | toys-r-us | 2.00 | 0.00 | 0.00 |
| Louisa Lau | relative | 2.00 | 0.00 | 0.00 |
Interpretation
The table is ordered by highest betweenness. Yuji Aizawa is one of my closest friends. It appears that he has both the highest number of mutual friends and betweenness. This table also shows that a high number of degree does not imply that betweenness is high. Moreover, my closer friends have generally high betweeness.
First filter out the labels to show only the names of interest
V(G)$label.cex = 1
labelsG = V(G)$label
labelsG[!(labelsG %in% nameList)] = NA
**The layout below is a great default layout for large graphs. Here is where I found it. In addition, the betweenness property is encoded by node size. Whereas, the relation is encoded with colour.
opar <- par()$mar; par(mar=rep(0, 4))
layout <- layout.fruchterman.reingold(G, niter=500, area=vcount(G)^2.3,
repulserad=vcount(G)^2.8)
myPlot = plot(G, layout=layout, vertex.size=log(betweenness(G) + 1),
vertex.label=labelsG, vertex.label.color="black")
legendLabels = unique(V(G)$relation)
legendColours = unique(V(G)$color)
legend("topleft", legend=legendLabels, col=legendColours, pch=19,
bty="n", cex=.8)
Interpretation
The algorithm layout (force-directed type?) algorithm did a fairly good job at placing the nodes on the graph in respect to the true relation labels. That is to say, we can clearly discriminate the different groups. Of course, there may have been a small bias while manually labelling the points. However, most relation reflect my first encounter with the person, so the bias is minimized in that respect. What stood out to me was, Simon Tai appears to be clustered with my friends whom play ultimate frisbee. However, I know him because of my relation with him within the stats department at ubc. It turns out he plays ultimate frisbee recreationally and has a lot of friends in common that also plays ultimate.
Now let’s take a look at what the erdos-renyi random graph look like.
labelsG = NA
erdo = erdos.renyi.game(length(V(G)), p.or.m=length(E(G)), type="gnm")
erdoPlot = plot(erdo, layout=layout, vertex.size=log(betweenness(erdo)+1),
vertex.label=labelsG)
title("Erdos-Renyi Random Graph")
We’ll try 2 different community detection algorithms: modularity and walk-trap. Modularity algorithm considers edges that fall within a community or between a community and the rest of the network. Walk-trap algorithm find communities through random walks. The idea behind walk-trap is that short random walks tend to stay in the same community. (This will count towards having tried out a method not introduced in class)
Execute community finding algorithms for personal network and erdos-renyi graphs
mc = fastgreedy.community(G)
mcErdo = fastgreedy.community(erdo)
wtc = walktrap.community(G, steps=4)
wtcErdo = walktrap.community(erdo, steps=4)
Plotting details are omitted here, but the code can be found in plotCommunites.R
Modularity score community finding algorithm|
My Ego Network |
Erdos-Renyi |
|
My Ego Network |
Erdos-Renyi |
Interpretation The erdos-renyi graph does not exhibit any interesting communities after performing the modularity and walk-trap algorithms. However, the ego network definitely has out some interesting communities. In particular the walk-trap algorithm detected the cluster people in the ego network that plays ultimate. Moreover, it grouped together my closest friends with the people who I met at a ubc event. It found the group of people I worked at toys-r-us. It trivially detected my relatives, since they have no connection with any other person in the graph.
# compute graph statistics similarly as for my ego network
getGraphStatsEgoS = function(egoStanfordFile){
egoS = read.graph(egoStanfordFile, directed=F)
egoS = simplify(egoS, remove.multiple=T, remove.loops=T)
egoSstr = sub("\\.edges", "", basename(egoStanfordFile))
out = data.frame("network"=paste0("stanford-ego_",egoSstr),
as.data.frame(computeGraphStats(egoS)))
return(out)
}
# find stanford anomalyzed ego networks
egoStanfordFiles = list.files("facebook_stanford", "edges", full.names=T)
graphStatsDf = ldply(egoStanfordFiles, .fun=getGraphStatsEgoS)
# combine the graph stats information in a dataframe
temp = rbind(as.data.frame(c("network"="my-ego_network", outGraphStats)),
as.data.frame(c("network"="erdo_network", gsErdo)))
graphStatsDf = rbind(temp, graphStatsDf)
graphStatsDf = graphStatsDf %>% arrange(desc(transitivity))
| network | avgShortestPath | transitivity | localClusterCoefG |
|---|---|---|---|
| stanford-ego_1912 | 2.56 | 0.70 | 0.01 |
| stanford-ego_698 | 1.93 | 0.66 | 0.00 |
| stanford-ego_414 | 2.69 | 0.65 | 0.01 |
| my-ego_network | 2.35 | 0.55 | 0.13 |
| stanford-ego_107 | 2.95 | 0.50 | 0.01 |
| stanford-ego_348 | 2.52 | 0.49 | 0.02 |
| stanford-ego_686 | 2.43 | 0.45 | 0.00 |
| stanford-ego_1684 | 3.04 | 0.45 | 0.00 |
| stanford-ego_3980 | 2.55 | 0.45 | 0.00 |
| stanford-ego_3437 | 3.45 | 0.45 | 0.00 |
| stanford-ego_0 | 3.75 | 0.43 | 0.04 |
| erdo_network | 1.88 | 0.13 | 0.13 |
Interpretation The table was arranged with respect to the transitivity, also known as clustering coefficient. It turns out that my ego network is third highest in terms of transivity. All the ego networks had much higher transitivity levels than the erdos-renyi random graph. Similary, the average shortest path is also all greater than the random graph. In terms of average shortest path and transitivity, my ego network fit right in with the stanford samples.
Community structure finding is carried out on the 8 sample ego network from the stanford database. We used both the modularity score and walk-trap algorithm. The figures are shown below.
Modularity score community finding algorithm|
My Ego Network |
Sample Network |
Sample Network |
Sample Network |
|
Sample Network |
Sample Network |
Sample Network |
Sample Network |
|
Sample Network |
Sample Network |
Erdo-Renyi |
|
|
My Ego Network |
Sample Network |
Sample Network |
Sample Network |
|
Sample Network |
Sample Network |
Sample Network |
Sample Network |
|
Sample Network |
Sample Network |
Erdo-Renyi |
|
Interpretation
It appears that the number of communities detected varies between the sample ego networks. The second network from the top left has the most similar community structure as my ego network. However, there are a lot of communities amongst the other networks. The third network from the top appears to have a giant componenent and many small cliques suggesting that there is a large group of friends with many connections as well as many small groups of friends.
McAuley, J., and J. Leskovec. 2012. “Learning to Discover Social Circles in Ego Networks.” NIPS.